
Conversation

@da-roth (Contributor) commented Jan 10, 2026

This PR integrates the Forge JIT backend for XAD, adding optional native code generation support. Forge is an optional dependency - everything builds and runs without it.

Changes

Build options added (see the example configure invocation after this list):

  • QLRISKS_ENABLE_FORGE: Enable Forge JIT backend
  • QLRISKS_ENABLE_FORGE_TESTS: Include Forge tests in test suite
  • QLRISKS_ENABLE_JIT_TESTS: Enable XAD JIT tests (interpreter backend, no Forge)
  • QLRISKS_BUILD_BENCHMARK / QLRISKS_BUILD_BENCHMARK_STANDALONE: Benchmark executables
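
All of these are standard CMake cache options, so a Forge-enabled build with the extra tests and benchmarks can be configured along these lines (paths and generator are illustrative):

```sh
cmake -S . -B build \
  -DQLRISKS_ENABLE_FORGE=ON \
  -DQLRISKS_ENABLE_FORGE_TESTS=ON \
  -DQLRISKS_ENABLE_JIT_TESTS=ON \
  -DQLRISKS_BUILD_BENCHMARK=ON
cmake --build build
```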

Files added:

  • test-suite/jit_xad.cpp: JIT infrastructure tests (interpreter backend)
  • test-suite/forgebackend_xad.cpp: Forge backend tests
  • test-suite/swaption_jit_pipeline_xad.cpp: JIT pipeline tests for LMM Monte Carlo
  • test-suite/swaption_benchmark.cpp: Boost.Test benchmarks
  • test-suite/benchmark_main.cpp: Standalone benchmark executable
  • test-suite/PlatformInfo.hpp: Platform detection utilities
  • .github/workflows/ql-benchmarks.yaml: Benchmark workflow

Files modified:

  • CMakeLists.txt: Forge integration options
  • test-suite/CMakeLists.txt: Conditional test/benchmark targets
  • .github/workflows/ci.yaml: Added forge-linux and forge-windows jobs

Benchmarks

The benchmark workflow (ql-benchmarks.yaml) runs swaption pricing benchmarks comparing FD, XAD tape, JIT scalar, and JIT-AVX methods on Linux and Windows.

Also included is some initial work towards #33: the workflow has type-overhead jobs that compare double vs. xad::AReal pricing performance (no derivatives) on the same hardware, providing a baseline for measuring XAD's type overhead. A sketch of that kind of comparison is below.
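
For illustration only - this is not the actual benchmark code - here is a minimal sketch of such a type-overhead comparison. The kernel, the sizes, and the names priceKernel/timeIt are made up, and it assumes xad::AReal<double> records nothing when no tape is active:

```cpp
#include <XAD/XAD.hpp>
#include <chrono>
#include <cmath>
#include <iostream>
#include <vector>

// Toy pricing kernel, templated on the scalar type so the identical code
// path is timed once with double and once with xad::AReal<double>.
template <class Real>
Real priceKernel(const std::vector<Real>& spots, int steps)
{
    using std::exp;  // for double; ADL picks up xad::exp for AReal
    Real acc(0.0);
    for (const Real& s : spots) {
        Real x = s;
        for (int i = 0; i < steps; ++i)
            x += exp(-x * 0.01) * 0.001;  // stand-in for pricing arithmetic
        acc += x;
    }
    return acc;
}

template <class Real>
double timeIt(int n, int steps)
{
    using xad::value;  // value() also passes through plain arithmetic types
    std::vector<Real> spots(n, Real(100.0));
    auto t0 = std::chrono::steady_clock::now();
    Real v = priceKernel(spots, steps);
    auto t1 = std::chrono::steady_clock::now();
    std::cout << "  result: " << value(v) << "\n";  // keep v observable
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

int main()
{
    std::cout << "double : " << timeIt<double>(100000, 50) << " ms\n";
    // No tape is created here, so nothing should be recorded - the intent
    // is to isolate the overhead of the active type itself.
    std::cout << "AReal  : " << timeIt<xad::AReal<double>>(100000, 50) << " ms\n";
}
```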

Example benchmark run (Linux): link

@da-roth (Contributor, Author) commented Jan 27, 2026

Hi @auto-differentiation-dev, thanks for the thorough review!

I refactored the benchmark tests quite a bit - I hope I've incorporated everything as requested. QL is now built three times (native double, XAD with JIT=OFF, XAD with JIT=ON) and a combined report is created:
[image: combined benchmark report]
Since JIT=OFF seemed to run into a bottleneck at 100k paths (my guess is a memory issue, since we record all paths and make a single adjoint call), I added XAD-Split to the JIT=OFF build as well. It does the same as in the JIT=ON case: use XAD for the bootstrap to build the Jacobian, then run XAD per path (a sketch of this split is below). Its performance scales as expected.
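
To make the split concrete, here is a minimal, self-contained sketch, assuming XAD's adjoint-mode tape API (xad::adj<double>); bootstrap() and pathPayoff() are toy stand-ins for the real curve bootstrap and LMM path simulation, and all sizes are made up:

```cpp
#include <XAD/XAD.hpp>
#include <iostream>
#include <vector>

using mode = xad::adj<double>;
using AD   = mode::active_type;  // xad::AReal<double>

// Toy stand-in for the curve bootstrap: m quotes -> n model parameters.
std::vector<AD> bootstrap(const std::vector<AD>& q)
{
    std::vector<AD> p(2);
    p[0] = q[0] + 0.5 * q[1];
    p[1] = q[1] * q[2];
    return p;
}

// Toy stand-in for one Monte Carlo path's discounted payoff.
AD pathPayoff(const std::vector<AD>& p, double z)
{
    return p[0] * exp(p[1] * z);
}

int main()
{
    std::vector<double> quotes = {0.01, 0.012, 0.015};  // market quotes q
    const std::size_t n = 2;                            // model parameters p

    mode::tape_type tape;

    // Stage 1: record the bootstrap once, extract the Jacobian J = dp/dq.
    std::vector<AD> q(quotes.begin(), quotes.end());
    for (auto& qi : q) tape.registerInput(qi);
    tape.newRecording();
    std::vector<AD> p = bootstrap(q);
    for (auto& pi : p) tape.registerOutput(pi);

    std::vector<std::vector<double>> J(n, std::vector<double>(q.size()));
    for (std::size_t i = 0; i < n; ++i) {
        tape.clearDerivatives();      // reset adjoints between sweeps
        derivative(p[i]) = 1.0;       // seed model parameter i
        tape.computeAdjoints();
        for (std::size_t j = 0; j < q.size(); ++j)
            J[i][j] = derivative(q[j]);
    }

    std::vector<double> pVal(n);
    for (std::size_t i = 0; i < n; ++i) pVal[i] = value(p[i]);

    // Stage 2: per-path adjoints on a short recording per path.
    const std::vector<double> draws = {-1.0, -0.3, 0.3, 1.0};  // fake RNG
    double v = 0.0;
    std::vector<double> dVdp(n, 0.0);
    for (double z : draws) {
        std::vector<AD> pp(pVal.begin(), pVal.end());
        for (auto& pi : pp) tape.registerInput(pi);
        tape.newRecording();          // tape now holds only this path
        AD payoff = pathPayoff(pp, z);
        tape.registerOutput(payoff);
        derivative(payoff) = 1.0;
        tape.computeAdjoints();
        v += value(payoff) / draws.size();
        for (std::size_t i = 0; i < n; ++i)
            dVdp[i] += derivative(pp[i]) / draws.size();
    }

    // Stage 3: chain rule, dV/dq = dV/dp * J.
    for (std::size_t j = 0; j < quotes.size(); ++j) {
        double g = 0.0;
        for (std::size_t i = 0; i < n; ++i) g += dVdp[i] * J[i][j];
        std::cout << "dV/dq[" << j << "] = " << g << "\n";
    }
    std::cout << "V = " << v << "\n";
}
```

The point is that the big bootstrap tape is recorded once, while each path only ever records a short tape, which keeps memory bounded.
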
Best, Daniel

@da-roth mentioned this pull request Feb 2, 2026
@da-roth (Contributor, Author) commented Feb 2, 2026

Hi @auto-differentiation-dev ,

this PR is now in a state for another round of review. I moved the overhead workflow into a new PR (#37), branched from this PR's branch, and will follow up on it once this one is merged.

Best, Daniel

@auto-differentiation-dev (Contributor) commented:

Hi @da-roth,

Sorry it took us some time to come back on this, and thank you for all the work - it is taking good shape.

For XAD, we think only the XAD-Split mode should be reported. This is the practical way to run Monte Carlo with XAD using path-wise derivatives, and it is what we would encourage users to adopt (see https://auto-differentiation.github.io/faq/#quant-finance-applications).

Looking at the results, we noticed a few patterns in the reported timings that seem unexpected and would be good to clarify. Overall, all methods show the expected linear behaviour - a fixed overhead plus a component that grows linearly with the number of paths. That said, the relative behaviour between methods looks unusual.

In particular:

  • The fixed cost for FD is extremely large (around 9s). Conceptually, this should correspond to the curve bootstrapping cost multiplied by the number of sensitivities, but even then it looks excessively high. Can you clarify what is driving this overhead?
  • The per-path cost of FD compared to XAD differs only by about a factor of 4, even though FD requires roughly 45 re-runs. Intuitively, we would expect a much larger gap here - how do you explain this behaviour?
  • Forge with AVX2 appears around 6x faster than Forge without AVX2 in terms of per-path scaling, despite AVX2 only providing 4 double-precision vector lanes. This seems stronger than what hardware vectorisation alone would suggest.
  • The reported curve bootstrap time with XAD of 223ms does not seem consistent with the overall trends implied by the data, where the fixed overhead looks closer to ~133ms, and Forge compilation appears to be around 120ms.

This makes us wonder whether there might be something off in how the timings are being measured or attributed. It would be helpful to understand how you interpret these trends.

Thanks again, and happy to discuss further.

Commits pushed: "fixes", "use original evolve for xad-split", "tried fixes", "first try", "added local script for fd and running benchmark locally".

@auto-differentiation-dev (Contributor) commented:

Hi @da-roth,

Looking at the numbers again, maybe it is just a matter of increasing the Monte Carlo workload so the runtime isn't dominated by bootstrapping. That would better reflect a real-world application. For example, we could include a portfolio of swaptions and additional cashflows.

@da-roth (Contributor, Author) commented Feb 3, 2026

Hi @auto-differentiation-dev,

I did some investigating - your remarks and intuitions were right. The QL code was doing some nasty re-computations of matrices during each step of the MC simulation. The code implementing this example is not optimal, but that makes it a really good working example for future work: it shows the impact of the double vs. AReal overhead, and the latest results also indicate where Forge is still not optimal. The pattern is sketched below.
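
Schematically, the issue and the fix look like this (hypothetical names - the real change is in the LMM evolution code):

```cpp
#include <vector>

using Matrix = std::vector<std::vector<double>>;

// Stand-in for the matrix the QL code was rebuilding on every step.
Matrix buildCovariance(std::size_t n)
{
    return Matrix(n, std::vector<double>(n, 0.01));
}

void evolvePath(std::vector<double>& state, std::size_t steps)
{
    // After the fix: built once per path (it does not change across steps).
    const Matrix cov = buildCovariance(state.size());
    for (std::size_t s = 0; s < steps; ++s) {
        // Before the fix, buildCovariance() was effectively called here,
        // repeating identical work on every step of every path.
        for (std::size_t i = 0; i < state.size(); ++i)
            for (std::size_t j = 0; j < state.size(); ++j)
                state[i] += cov[i][j] * 0.5;  // toy state update
    }
}
```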

Anyway, I did a minor optimization so that the matrix is only computed once per path, and I see this locally:
Timings, native double FD (in ms):

| Paths | Method | Mean    | StdDev |
|-------|--------|---------|--------|
| 1K    | FD     | 4820.3  | 10.0   |
| 10K   | FD     | 6781.9  | 0.0    |
| 100K  | FD     | 26416.9 | 0.0    |

Timings, AReal with JIT = ON (in ms):

| Paths | Method    | Mean   | StdDev | Setup* | Speedup vs XAD |
|-------|-----------|--------|--------|--------|----------------|
| 1K    | XAD       | 180.0  | 0.8    | ---    | ---            |
|       | XAD-Split | 143.5  | 2.4    | 109.5  | 1.25x          |
|       | JIT       | 190.7  | 5.0    | 128.7  | 0.94x          |
|       | JIT-AVX   | 143.8  | 0.5    | 128.7  | 1.25x          |
| 10K   | XAD       | 847.8  | 4.0    | ---    | ---            |
|       | XAD-Split | 449.3  | 1.5    | 110.9  | 1.89x          |
|       | JIT       | 721.7  | 0.0    | 127.9  | 1.17x          |
|       | JIT-AVX   | 258.5  | 1.6    | 127.9  | 3.28x          |
| 100K  | XAD       | 7435.5 | 0.0    | ---    | ---            |
|       | XAD-Split | 3463.5 | 0.0    | 110.7  | 2.15x          |
|       | JIT       | 6060.8 | 0.0    | 127.6  | 1.23x          |
|       | JIT-AVX   | 1413.2 | 0.0    | 127.6  | 5.26x          |

\* Setup: one-time setup cost (Jacobian bootstrap for XAD-Split; Forge compilation for JIT).

So even with the AAD/AReal overhead (amplified, of course, by the suboptimal implementation in QL), XAD still gives roughly a 7.5x benefit over native-double FD for this example (26416.9 ms / 3463.5 ms ≈ 7.6x at 100K paths).

Interestingly, XAD-Split is faster than scalar JIT - I think this shows both how well XAD is optimized and the cost of the unnecessary computations still present in the JIT path.

The gap between JIT and JIT-AVX doesn't surprise me too much - I spent some time improving the throughput of setting inputs and getting outputs for the AVX path. So it is effectively 4 lanes plus some infrastructural improvements that could apply to the scalar JIT too; I'll port these to the scalar JIT in the future.

Let's see how the benchmarks look in the cloud. Would you like any changes here? I really like this example since it gives us all the insights for future improvements, but of course one could construct something with a higher speedup over native FD if desired (more inputs, digging further into avoiding unnecessary computations, etc.). Thinking out loud, my intuition is: the better XAD performs relative to native FD, the nearer scalar JIT will get to XAD-Split (and at some point it should be slightly faster, as we saw in the XAD repo's results).

Cheers, Daniel
